Pattern-Based Clustering for Database Attribute Values
Authors: Matthew Merzbacher, Wesley W. Chu, Computer Science Department, University of California, Los Angeles, CA 90024

Abstract

We present a method for automatically clustering similar attribute values in a database system spanning multiple domains. The method constructs an attribute abstraction hierarchy for each attribute using rules that are derived from the database instance. Each rule has a confidence and a popularity, which combine to express the "usefulness" of the rule. Attribute values are clustered if they are used as the premise for rules with the same consequence. Applying the algorithm iteratively yields a hierarchy of clusters. The algorithm can be improved by allowing a domain expert to supervise the clustering process. An example and experimental results from a large transportation database are included.
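The clustering step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not define confidence, popularity, or how they combine, so the sketch assumes confidence = fraction of rows with premise value `a` that also have consequence `c`, popularity = number of rows matching both, and a hypothetical `usefulness = confidence * popularity / n_rows`. Because a value can be the premise of rules with different consequences, the sketch also assumes each value joins the cluster of its single most useful rule, which yields a partition. The function names, thresholds (`min_conf`, `min_pop`), and the toy shipping rows are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]
# A rule "premise_attr = a -> consequence_attr = c" stored as (a, c, confidence, popularity).
Rule = Tuple[str, str, float, int]


def derive_rules(rows: List[Row], premise_attr: str, consequence_attr: str) -> List[Rule]:
    """Derive single-premise rules from the database instance.

    Assumed definitions (not given in the abstract):
    confidence = rows with (a, c) / rows with a; popularity = rows with (a, c).
    """
    premise_counts: Dict[str, int] = defaultdict(int)
    pair_counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in rows:
        a, c = row[premise_attr], row[consequence_attr]
        premise_counts[a] += 1
        pair_counts[(a, c)] += 1
    return [(a, c, n / premise_counts[a], n) for (a, c), n in pair_counts.items()]


def usefulness(confidence: float, popularity: int, n_rows: int) -> float:
    """Hypothetical combination: confidence weighted by the rule's share of rows."""
    return confidence * popularity / n_rows


def cluster_values(rows: List[Row], premise_attr: str, consequence_attr: str,
                   min_conf: float = 0.5, min_pop: int = 1) -> Dict[str, Set[str]]:
    """Group premise values whose most useful rule shares the same consequence."""
    best: Dict[str, Tuple[float, str]] = {}  # premise value -> (usefulness, consequence)
    for a, c, conf, pop in derive_rules(rows, premise_attr, consequence_attr):
        if conf < min_conf or pop < min_pop:
            continue  # discard weak or rarely supported rules
        u = usefulness(conf, pop, len(rows))
        if a not in best or u > best[a][0]:
            best[a] = (u, c)
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for a, (_, c) in best.items():
        clusters[c].add(a)  # values with the same consequence form one cluster
    return dict(clusters)


# Hypothetical toy rows from a shipping table (not data from the paper).
rows = [
    {"cargo": "crude oil", "ship_type": "tanker"},
    {"cargo": "diesel", "ship_type": "tanker"},
    {"cargo": "crude oil", "ship_type": "tanker"},
    {"cargo": "grain", "ship_type": "bulk carrier"},
    {"cargo": "coal", "ship_type": "bulk carrier"},
    {"cargo": "coal", "ship_type": "bulk carrier"},
    {"cargo": "grain", "ship_type": "tanker"},  # noisy row, filtered by min_conf
    {"cargo": "grain", "ship_type": "bulk carrier"},
]

for consequence, values in sorted(cluster_values(rows, "cargo", "ship_type").items()):
    print(consequence, sorted(values))
# prints:
# bulk carrier ['coal', 'grain']
# tanker ['crude oil', 'diesel']
```

This covers only one level of the hierarchy. One hypothetical way to obtain higher levels is to relabel each value with its cluster and repeat the step against another attribute; the abstract does not say how the authors iterate or where expert supervision enters.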
Similar Resources
A Database Model for Medical Consultation
The database model presented in this paper is suitable for applications in which queries may require non-crisp references to certain attributes. The data item (attribute) values may be crisp or fuzzy. For instance, adjectives such as 'high' or 'normal' may be values of the attribute blood pressure. A disease or a condition can be described by a number of symptoms which may be crisp al...
Separating indexes from data: a distributed scheme for secure database outsourcing
Database outsourcing aims to relieve organizations of the burden of database management. Since data is a critical organizational asset, its privacy must be protected from outside adversaries and untrusted servers. In this paper, we present a distributed scheme based on storing shares of data on different servers and separating indexes from data on a distinct server. Shamir...
A Novel Technique for Pattern Extraction in Mixed Data
Knowledge discovery in databases, or data mining, is an important issue in the development of data and knowledge base systems. The Self-Organizing Map (SOM) is a vector quantization method which places the prototype vectors on a regular low-dimensional grid in an ordered fashion. Clustering data and extracting patterns from the clusters are very important tasks in data mining. An attribute-oriented...
A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data
The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is suitable only for crisp, complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...